The SCAM Approach to Copy Detection in Digital Libraries

Authors

  • Narayanan Shivakumar
  • Hector Garcia-Molina

Abstract

Scenario 1: Your local publishing company, Books'R'Us, decides to publish its latest book on the Internet in an effort to cut printing costs and book distribution expenses. Customers pay for the digital books using sophisticated electronic payment mechanisms such as DigiCash, First Virtual or InterPay. When a payment is received, the book distribution server at Books'R'Us electronically sends a digital version of the book to the paying customer. Books'R'Us expects to make higher profits on the digital book because of lower production and distribution costs and the larger markets on the Internet. It turns out, however, that very few copies are sold, since digital copies of the book have been widely circulated on UseNet newsgroups and bulletin boards, and are available for free on alternate ftp sites and Web servers. Books'R'Us retracts its digital publishing commitment, blaming the ease of re-transmitting digital items on the Internet, and returns to traditional paper-based publishing.

Scenario 2: Sheng wants to buy a new Pentium portable, so before choosing a brand she wants to read articles and reviews on the different brands available. She searches information services such as Dialog, Lycos, Gloss and Webcrawler, follows UseNet newsgroups to find articles on the different portables available, and finds nearly 1500 articles. When she starts reading, she finds that most of the articles are really duplicates or near-duplicates of one another and contribute no new information to her search. She realizes this is because most databases maintain their own local copies of articles, perhaps in different formats (Word, PostScript, HTML), or have mirror sites that contain the same set of articles. Sheng then trudges through the articles one by one, wishing that somebody would build a system that automatically removes exact and near-duplicates so that she only needs to read each distinct article once.

Around article 150, Sheng decides not to buy a certain brand, since she learns from the articles that the brand has had problems with its color display since its release. But she has to

Similar Resources

Detection of Copy-Move Forgery in Digital Images Using Scale Invariant Feature Transform Algorithm and the Spearman Relationship

Increased popularity of digital media and image editing software has led to the spread of multimedia content forgery for various purposes. Undoubtedly, law and forensic medicine experts require trustworthy and non-forged images to enforce rights. Copy-move forgery is the most common type of manipulation of digital images. Copy-move forgery is used to hide an area of the image or to repeat a por...

SCAM: A Copy Detection Mechanism for Digital Documents

Copy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the ...

Design and Evaluation Indicators for Digital Libraries

Introduction: There has always been doubt about the concept and frameworks of digital libraries; concepts such as electronic library, virtual library, library without walls, hybrid library and digital library have often been applied together, or in place of one another, to convey the library concept. Studies have shown that so far there is no standard, universally accepted definition for digital libraries, howe...

Study & Evaluation of Document Comparing Mechanisms

Digital libraries have made access to documents very easy, but this also makes documents vulnerable to being copied. The illegal distribution of documents discourages authors and news feed services from sharing their information. Hence it is extremely important to protect intellectual property. In this paper we have looked at the earliest comparison methods, namely string comparison algorithms that com...

A symbol-based fuzzy decision-making approach to evaluate the user satisfaction on services in academic digital libraries

Academic libraries play a significant role in providing core services that include research, teaching and learning. User satisfaction is an important indicator for evaluating the performance of library services. This paper develops a method for measuring user satisfaction in a group decision-making environment. First, the performance of services is evaluated using a questionnaire survey. The sc...

Organizing News Archives by Near-Duplicate Copy Detection in Digital Libraries

There are huge numbers of documents in digital libraries. How to effectively organize these documents so that humans can easily browse or reference is a challenging task. Existing classification methods and chronological or geographical ordering only provide partial views of the news articles. The relationships among news articles might not be easily grasped. In this paper, we propose a near-du...

Journal title:
  • D-Lib Magazine

Volume: 1

Publication date: 1995